SYSTEM AND METHOD FOR SPEECH RECOGNITION
ASSISTED VOICE COMMUNICATIONS 



TECHNICAL FIELD OF THE INVENTION 

The present invention relates generally to voice communications and more 
particularly to a system and method for speech recognition assisted voice 
communications. 

BACKGROUND OF THE INVENTION

In the search for low cost, long distance telephone service, the Internet offers an
attractive alternative to traditional telephone networks. Through the Internet, users from
around the world can place Internet protocol (IP) telephone calls without incurring
additional costs other than those associated with maintaining a connection to the Internet.
However, the Internet was not designed for real-time communications, and the underlying
transport mechanisms of the Internet may result in delays and the loss of data. Thus,
voice communications taking place over the Internet may suffer serious degradation in
quality when packets relaying voice communications are lost or delayed.

SUMMARY OF THE INVENTION

In accordance with the present invention, a system and method for speech
recognition assisted voice communications is provided which substantially eliminates or
reduces disadvantages and problems associated with previous systems and methods. In
a particular embodiment, the present invention satisfies a need for a packet-based
communications session that provides both real-time voice communications and a
reliable stream of text encoding those voice communications.

According to one embodiment of the present invention, a method for
communicating voice and text associated with a packet-based voice communication
session establishes the packet-based voice communication session with a remote location,
receives voice information from a local participant in the packet-based voice
communication session, and converts the voice information into text. The method
generates packets encoding the voice information and the text and communicates the
packets encoding the voice information and the text to the remote location. More
specifically, the method generates a first stream of packets encoding the text and a second
stream of packets encoding the voice information.

In accordance with another embodiment of the present invention, an interface for
a telecommunications device receives packets encoding voice information and text from
a remote location, wherein the voice information and the text are associated with a
packet-based voice communication session. The interface displays the text using a visual
display device and outputs the voice information using an acoustic output device. More
specifically, the interface may receive local voice information from a local participant in
the packet-based voice communication session, convert the local voice information into
local text, and generate packets encoding the local voice information and the local text.
The interface may then communicate the packets encoding the local voice information
and the local text to the remote location.

The invention provides a number of technical advantages. The system provides 
a method for communicating voice information using a packet-based communications 
network while providing a reliable stream of text encoding the voice communications.




ATTORNEY'S DOCKET 
062891.0397 






PATENT APPLICATION 



Each of the participants in the telephone conversation may display a running transcript 
of the conversation. This transcript provides a reference during the conversation, allows 
for more effective communications during periods of low quality voice communications, 
and may be saved to a file for later use. In addition, either the sending or receiving 
party's system may convert the text stream into different languages.

Other technical advantages of the present invention will be readily apparent to one 
skilled in the art from the following figures, descriptions and claims. 

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages 
thereof, reference is now made to the following descriptions, taken in conjunction with 
the accompanying drawings, in which: 
FIGURE 1 is a block diagram illustrating a system having devices supporting
voice and text communications in accordance with the teachings of the present invention;

FIGURE 2 is a block diagram illustrating an exemplary communications device
from the system;

FIGURE 3 is an exemplary user interface for the communications device;

FIGURE 4 is a flowchart illustrating a method for establishing a communications
session and negotiating voice and text communications;

FIGURE 5 is a flowchart illustrating a method for processing voice
communications received from a user; and

FIGURE 6 is a flowchart illustrating a method for processing voice and text
communications received from a remote communications device.

DETAILED DESCRIPTION OF THE INVENTION

FIGURE 1 illustrates a communications system, indicated generally at 10, that
includes communications equipment 12 coupled to a communications network 14. In
general, system 10 provides packet-based voice communications between
communications equipment 12 located at different locations while simultaneously
providing an underlying text communications stream encoding the voice
communications.

Communications equipment 12 includes a computing device 16 and a
communications interface 18. Communications interface 18 provides input and output
of acoustic signals using any suitable input device, such as a microphone, and any
suitable output device, such as a speaker. In a particular embodiment, communications
interface 18 may be a speaker-phone. Computing device 16 represents any hardware
and/or software that provides an interface between communications equipment 12 and
communications network 14, processes the receipt and transmission of packet-based
voice communications, converts communications between voice and text, displays text,
and performs other appropriate processing and control functions. For example,
computing device 16 may be a general purpose computing device such as a laptop or
desktop computer, a specialized communications device such as an Internet protocol (IP)
telephone, or other suitable processing or communications device.

While computing device 16 and communications interface 18 are shown as
separate functional units, these components may be combined into one device or may be
separated into smaller functional components when appropriate. In a particular
embodiment, computing device 16 represents a general purpose computer coupled to the
Internet and running IP telephony communications software. Communications interface
18 represents a speaker and a microphone coupled to and/or integral with the computer.

Network 14 represents any collection and arrangement of hardware and/or 
software providing packet-based communications between communications equipment 
12 at different locations. For example, network 14 may be one or a collection of 
components associated with the public-switched telephone network (PSTN), local area
networks (LANs), wide area networks (WANs), a global computer network such as the
Internet, or other suitable wireline or wireless communications technology that supports 
communications between multiple devices. 

In operation, users of system 10 establish a packet-based voice communications
session on communications network 14 between communications equipment 12 at
multiple locations. During setup or at any other appropriate time during the session, the
participants may establish two communications streams using network 14, a voice link
20 and a text link 22. At each location, a participant in the communications session
speaks, and computing device 16 receives this local voice information using
communications interface 18. Computing device 16 encodes the voice information into
packets and communicates these packets to remote communications equipment 12 using
voice link 20. In addition, computing device 16, using any suitable speech recognition
software and/or hardware, converts the voice information into text, encodes the text into
packets, and communicates these packets to remote communications equipment 12 using
text link 22. This dual-stream session provides voice over packet (VoP) communications
while simultaneously providing a reliable stream of text encoding these voice
communications. Computing device 16 may use any appropriate speech recognition
hardware and/or software for converting between voice and text. For example,
computing device 16 may operate using IP telephony software which contains speech
recognition capabilities or may interface packet-based communications software with
commercially available speech recognition software.

FIGURE 2 is a block diagram illustrating in more detail the functional
components of communications equipment 12, including the components of both
computing device 16 and communications interface 18. Communications equipment 12
includes communications interface 18, a visual display 30, a voice/text module 32, a
coder/decoder (CODEC) 34, and a network interface 36. Network interface 36 provides
connectivity between communications equipment 12 and network 14 using any suitable
wireless or wireline communications protocol. For example, network interface 36 may
be a computer modem coupled to an Internet service provider (ISP), a wireless network
interface device, or other appropriate communications interface. Network interface 36
transmits and receives packet-based communications using any suitable communications
protocol, such as Internet protocol (IP), an asynchronous transfer mode (ATM) protocol,
or other suitable communications protocol.

CODEC 34 encodes voice information received from a microphone 38 into
packets of data for transmission by network interface 36. CODEC 34 streams packets
encoding real-time data such as audio or video signals using appropriate parsing,
compressing, encoding, packetizing, buffering, and processing. In addition, CODEC 34
decodes packets of information received from network interface 36 into audio signals for
output using a speaker 40. Decoding encompasses any steps necessary for receiving a
real-time stream of data. For example, CODEC 34 may decompress the information
encoded in the received packets, separate out audio frames, maintain a jitter buffer, and
provide other suitable processing for extracting audio signals from received packets.
Microphone 38 and speaker 40 represent suitable input and output devices for receiving
and outputting audio signals from and to users of communications equipment 12.
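
The jitter buffer mentioned above can be illustrated with a minimal sketch. It is not taken from the described system: the class name `JitterBuffer`, the `depth` parameter, and the use of integer sequence numbers are assumptions made only for illustration. The buffer holds back a few frames so that packets arriving out of order can be released in sequence, and it discards frames that arrive after their playout slot has passed.

```python
import heapq


class JitterBuffer:
    """Hypothetical reorder buffer: releases frames in sequence-number order."""

    def __init__(self, depth=3):
        self.depth = depth      # assumed: number of frames held back before playout
        self._heap = []         # min-heap of (sequence number, frame)
        self._next_seq = None   # lowest sequence number still acceptable

    def push(self, seq, frame):
        # A frame older than the last one played out is too late to be useful.
        if self._next_seq is not None and seq < self._next_seq:
            return
        heapq.heappush(self._heap, (seq, frame))

    def pop_ready(self):
        """Return the frames that may be played now, lowest sequence first."""
        ready = []
        while len(self._heap) > self.depth:
            seq, frame = heapq.heappop(self._heap)
            self._next_seq = seq + 1
            ready.append(frame)
        return ready
```

A larger `depth` tolerates more reordering at the cost of added playout delay, which is the usual trade-off for real-time audio.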

Voice/text module 32 includes speech recognition modules capable of converting
voice information received using microphone 38 into text and then encoding the text into
packets for communication using network interface 36. Alternatively, voice/text module
32 may receive partially or fully processed voice information from CODEC 34. For
example, voice/text module 32 may receive and process information that has been
digitized by CODEC 34. Voice/text module 32 may also display the text encoding the
voice information received from microphone 38 using visual display 30, allowing a user
to view text of his or her spoken words during a conversation.

In addition, voice/text module 32 receives packets encoding voice information
from a remote location, retrieves the remote text information from the packets, and
displays the remote text information using visual display 30. Thus, communications
equipment 12 may display a substantially real-time transcript of a voice communications
session for reference during the conversation, to supplement the voice communications
during periods of reduced transmission quality, or to save for future reference. This
transcript may include both local and remote voice communications.

Voice/text module 32 may also provide speech synthesis capabilities. For
example, voice/text module 32 may receive packets encoding remote voice information
from network interface 36 and use this remote voice information to generate audio
signals using speaker 40. Moreover, voice/text module 32 may work in conjunction with
CODEC 34 to supplement poor quality voice communications with synthesized speech.
In addition, voice/text module 32 may translate text from a first language to a second
language. For example, text received in English, either from microphone 38 or in packets
from network interface 36, may be translated to any other language for display using
visual display 30 or communication using network interface 36. Furthermore, this
translation capability may be used in conjunction with speech synthesis to provide a
translated audio signal for output using speaker 40.

Communications equipment 12 also includes a memory 42 storing data and
software. Memory 42 represents any one or combination of volatile or non-volatile, local
or remote devices suitable for storing data, for example, random access memory (RAM)
devices, read-only memory (ROM) devices, magnetic storage devices, optical storage
devices, or any other suitable data storage devices. Memory 42 may store transcripts of
current and previous communication sessions, communications applications, telephony
applications, interface applications, speech synthesis applications, speech recognition
applications, language translation applications, and other appropriate software and data.

In operation, a user of communications equipment 12 establishes a packet-based
voice communications session with a remote communications device. At any suitable
time during the communications session, communications equipment 12 may determine
that the remote device provides for a voice and text communications session. Based on
this determination, communications equipment 12 may establish a voice and text session
with the remote device at any time. Communications equipment 12 may establish the
voice and text session during the setup of the initial call, when a degradation in the
quality of the voice communications is detected, or at any other appropriate time. In a
particular embodiment, a degradation in the quality of the voice link triggers an automatic
initialization of the voice-to-text capabilities. Hence, at any suitable time during or in
setup of a communications session, equipment 12 establishes voice link 20 and text link
22 with a remote communications device.

During the voice and text session, communications equipment 12 receives voice
information using microphone 38. The voice information is then encoded into packets
using CODEC 34, and these packets are transmitted to the remote device on voice link
20 using network interface 36. Substantially simultaneously, the voice information is
converted into text and encoded into packets using voice/text module 32, and these
packets are communicated to the remote location on text link 22 using network interface
36. Thus, communications equipment 12 generates dual communications streams. One
stream communicates packets encoding the text of the voice conversation, and the other
stream communicates packets encoding the voice information. Using two different
streams, the voice and text packets may be assigned different levels of service. However,
system 10 contemplates combining the two streams such that text information
"piggybacks" in the voice packets. Thus the text information may be communicated in
the same packet as voice information.
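
As an illustration of text piggybacking in a voice packet, the following sketch packs one voice frame and its recognized text into a single payload. The header layout (a 32-bit sequence number followed by two 16-bit length fields) and the function names are hypothetical; no wire format is specified in this description.

```python
import struct

# Hypothetical header: sequence number (uint32), voice length (uint16),
# text length (uint16), all in network byte order.
HEADER = struct.Struct("!IHH")


def pack_combined(seq, voice, text):
    """Build one payload carrying a voice frame and the text that encodes it."""
    text_bytes = text.encode("utf-8")
    return HEADER.pack(seq, len(voice), len(text_bytes)) + voice + text_bytes


def unpack_combined(packet):
    """Split a combined payload back into (sequence number, voice, text)."""
    seq, voice_len, text_len = HEADER.unpack_from(packet)
    start = HEADER.size
    voice = packet[start:start + voice_len]
    text_end = start + voice_len + text_len
    text = packet[start + voice_len:text_end].decode("utf-8")
    return seq, voice, text
```

With this layout the receiver can route the voice bytes to the CODEC and the text bytes to the voice/text module after a single header read. A combined packet shares one level of service, so the text is lost whenever the voice packet is lost.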

Network interface 36 may communicate packets encoding the voice and text
information using any suitable transmission protocol. Voice and text information streams
may be communicated using the same protocol or using different protocols. In a
particular embodiment, the two communications streams are transmitted using different
communications protocols. According to this embodiment, network interface 36
communicates the packets encoding voice information using a communications protocol
such as user datagram protocol (UDP) and communicates the packets encoding the text
using a more reliable communications protocol, such as transmission control protocol
(TCP). By using a more reliable communications protocol for the transmission of
packets encoding text information, the stream of text information will be maintained even
during periods of low network quality. These text packets will provide, even if somewhat
delayed, a virtually guaranteed communications link.
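
The UDP and TCP embodiment can be sketched with standard sockets. This is a loopback illustration under stated assumptions, not the described implementation: the function names, the newline delimiter between utterances, and the use of 127.0.0.1 in place of a remote device are all hypothetical.

```python
import socket


def open_dual_links(host="127.0.0.1"):
    """Create loopback endpoints standing in for the local and remote devices."""
    # Receiving side: a UDP socket for the voice link, a TCP listener for the text link.
    voice_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    voice_rx.bind((host, 0))
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen(1)

    # Sending side: an unconnected UDP socket and a TCP connection.
    voice_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    text_tx = socket.create_connection(listener.getsockname())
    text_rx, _ = listener.accept()
    listener.close()
    return voice_tx, voice_rx, text_tx, text_rx


def send_utterance(voice_tx, voice_addr, text_tx, voice_frame, text):
    """Send one voice frame over UDP and its text over TCP."""
    # Voice: best effort with no retransmission, so a lost datagram is never resent.
    voice_tx.sendto(voice_frame, voice_addr)
    # Text: TCP retransmits lost segments, so the text arrives intact, possibly late.
    # The newline is an assumed delimiter marking the end of each utterance.
    text_tx.sendall(text.encode("utf-8") + b"\n")
```

In use, a caller would pass `voice_rx.getsockname()` as `voice_addr`, read datagrams from `voice_rx`, and read delimited lines from `text_rx`.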

While the preceding descriptions detail specific functional modules, system 10
contemplates implementing each of the components in communications equipment 12
using any suitable combination and arrangement of hardware and/or software. In
addition, functionalities from each of these components may be combined or separated
into smaller functional units when appropriate, and any of the methods or functionalities
described may be implemented by a computer program stored on a computer readable
medium.

FIGURE 3 illustrates a graphical user interface 50 for communications equipment 
12. Graphical user interface 50 includes a text display 52, speech synthesis options 54, 
transcript options 56, and translation options 58. In general, communications equipment
12 presents graphical user interface 50, which displays the text associated with a voice 
communications session and allows for the selection of various options. 

Text display 52 provides both a received text field for the display of text 
information received from a remote location and an outgoing text field for the display of 
text from voice/text module 32 based on microphone 38 input. While text display 52 in 
this example displays incoming and outgoing text in different fields, system 10 
contemplates any appropriate method for displaying text associated with voice 
communications sessions, such as a unified field for the display of all associated text. 
Moreover, text display 52 may display only received text or only outgoing text based on 
the options selected and/or the capabilities of the communicating devices. 

Speech synthesis options 54 toggle ON and OFF speech synthesis, and select 
between a full synthesis, where speech is synthesized based solely on text
communications, and a supplement feature, in which voice information is supplemented 
using speech synthesized from the text packets. Transcript options 56 control whether 
a transcript of a communications session is saved to memory 42. In a particular 
embodiment, communications equipment 12 automatically saves a temporary transcript 
of a voice communications session so that a user may decide during or after a 
conversation whether to permanently save the transcript. 

Translation options 58 control the translation of transmitted and received text.
Thus, a user of communications equipment 12 may select to translate outgoing text
before transmission to a remote location or may select to translate text received from the
remote location. While this example shows a specific list of languages available for
translation, system 10 contemplates providing translation capabilities to and from any
languages. In addition, user interface 52 may provide further options for specifying
translation options 58, such as the language used by the remote location. Furthermore,
translation options 58 may be used in conjunction with speech synthesis options 54 to
generate translated speech during a communications session.

The features listed and the arrangement of these features on graphical user
interface 50 illustrate only a specific example of features that may be supported by
communications equipment 12. System 10 contemplates graphical user interface 50
containing any combination and arrangement of features for controlling a voice and text
communication session. For example, graphical user interface 50 may also display a
telephone number pad along with other buttons providing various telephony features, thus
providing a fully functional computer-implemented telephone.

FIGURE 4 is a flowchart illustrating a method for establishing a communications
session and negotiating voice and text communications using communications equipment
12. Communications equipment 12 establishes a communications session with a remote
location at step 70. This session communicates voice information using any suitable
packet-based communications protocol, and may additionally include the communication
of data, video, or other information using any suitable transmission protocol.

Communications equipment 12 determines whether a user has selected text
enhanced communications at step 72. If so, communications equipment 12 negotiates
text link 22 with the remote device at step 82. If the user has not selected a text enhanced
session, communications equipment 12 determines whether the remote device has
requested a text enhanced communication session at step 74. If so, communications
equipment 12 negotiates text link 22 with the remote device at step 82. If the remote
device has not requested, and the user has not selected, a text enhanced session,
communications equipment 12 processes a normal voice communications session with
the remote device at step 76.

During this voice session, communications equipment 12 monitors activity to
determine whether a condition requiring text enhanced communications has been
detected at step 78. This includes any suitable condition, such as a degradation in the
quality of voice communications, a request from the user or the remote device, or any
other suitable triggering event. If no such condition has been detected, communications
equipment 12 determines whether the session has completed at step 80 and, if not,
continues processing the voice session at step 76. However, if a condition requiring a text
enhanced session has been detected, communications equipment 12 negotiates text link
22 with the remote device at step 82.

Once an appropriate triggering event has been detected, and communications
equipment 12 has negotiated text link 22 with the remote device, communications
equipment 12 processes the voice and text communications session at step 84. This
processing of voice and text communications is discussed in detail in the following
flowcharts. Moreover, while this flowchart illustrates specific events that may trigger the
initialization of a voice and text communication session, communications equipment 12
may negotiate and establish voice and text communications for any suitable reason. In
addition, system 10 contemplates using any appropriate method for establishing voice
and text communications between communications equipment 12 and a remote device.
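
The decisions of FIGURE 4 can be summarized in a short sketch. The names `Action`, `next_action`, and `voice_degraded`, and the 5% loss threshold, are assumptions introduced for illustration; the description leaves the degradation measure and the negotiation mechanism open.

```python
from enum import Enum


class Action(Enum):
    NEGOTIATE_TEXT_LINK = "negotiate text link 22 (step 82)"
    CONTINUE_VOICE_ONLY = "process voice session (step 76)"
    END_SESSION = "session complete"


def voice_degraded(lost, expected, threshold=0.05):
    """Hypothetical trigger: packet loss above an assumed 5% threshold."""
    return expected > 0 and lost / expected > threshold


def next_action(user_selected, remote_requested,
                condition_detected=False, session_done=False):
    # Steps 72 and 74: either party may ask for a text enhanced session.
    if user_selected or remote_requested:
        return Action.NEGOTIATE_TEXT_LINK
    # Step 78: a triggering condition, such as degraded voice quality.
    if condition_detected:
        return Action.NEGOTIATE_TEXT_LINK
    # Step 80: otherwise stay in the voice-only loop until the session ends.
    if session_done:
        return Action.END_SESSION
    return Action.CONTINUE_VOICE_ONLY
```

For example, a caller might evaluate `next_action(False, False, condition_detected=voice_degraded(lost=10, expected=100))` once per monitoring interval.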

FIGURE 5 is a flowchart illustrating a method for processing voice
communications using communications equipment 12. Communications equipment 12
establishes a communications session with a remote location at step 100. This session
communicates voice information using any suitable packet-based communications
protocol, and may additionally include the communication of data, video, or other
information using any suitable transmission protocol. Communications equipment 12
monitors input received by microphone 38 and determines whether input of voice
information has been received at steps 102 and 104 respectively. If communications
equipment 12 detects no input, monitoring is resumed at step 102.

Upon detecting input, communications equipment 12 converts the voice
information received into text at step 106. Communications equipment 12 determines
whether local voice display is enabled at step 108. Local voice display may be automatic
or may be selected by a user through any appropriate interface, such as a DISPLAY
button 60 on user interface 50. If not enabled, flow skips to step 112. If local voice
display is enabled, communications equipment 12 displays local voice information
received using visual display 30 at step 110. Communications equipment 12 generates
packets encoding the voice information and packets encoding the text at steps 112 and
114 respectively. Communications equipment 12 communicates the packets to the
remote location using network interface 36 at step 116. In addition, the local voice
communications may optionally be translated into various languages for display and/or
transmission to the remote location.

While this flowchart illustrates an exemplary method for processing voice
communications using communications equipment 12, system 10 contemplates using any
appropriate method for processing voice communications using dual communications
streams to transmit voice information and text encoding the voice information using a
packet-based protocol. Moreover, system 10 contemplates many of the steps in this
flowchart taking place simultaneously and/or in different orders than shown. For
example, packets encoding the voice information may be generated and communicated
as soon as possible, while a copy of the voice information is converted to text, encoded
into packets, and then communicated.
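
The reordering described in the example above, in which voice is sent first and text follows, might be sketched as follows. The function `process_local_voice` and its callback parameters are hypothetical stand-ins for CODEC 34, voice/text module 32, visual display 30, and network interface 36; the recognizer is supplied by the caller because no particular speech recognition engine is assumed.

```python
def process_local_voice(voice_frame, recognize, send_voice, send_text, display=None):
    """Handle one captured voice frame; every callback is supplied by the caller."""
    # Send the voice packet first so recognition latency does not delay the audio.
    send_voice(voice_frame)
    # Step 106: convert a copy of the voice information into text.
    text = recognize(voice_frame)
    # Steps 108 and 110: show the local text only when local display is enabled.
    if display is not None:
        display(text)
    # Steps 114 and 116: encode and communicate the text.
    send_text(text)
    return text
```

Passing `display=None` corresponds to local voice display being disabled at step 108.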

FIGURE 6 is a flowchart illustrating a method for processing communications
received from a remote participant in a communications session. Communications
equipment 12 establishes a communication session with a remote location using network
interface 36 at step 150. Communications equipment 12 monitors communications from
the remote location and determines whether packets have been received at steps 152 and
154 respectively. If no packets have been received, communications equipment 12
continues monitoring communications at step 152. If a packet has been received,
communications equipment 12 determines whether the packet encodes voice information
at step 156.

In a particular embodiment, packets encode both voice and text information. 
Thus, equipment 12 must extract the different types of information from the packet for
processing by the appropriate modules. However, in this example, communications 
equipment 12 receives voice and text packets using separate communications streams. 
Therefore, if a voice packet is received, communications equipment 12 outputs the voice 
information as an audio speech signal using speaker 40 at step 170. However, in certain 
circumstances, communications equipment 12 may suppress the output of signals 
received in voice packets. For example, if a user has enabled speech synthesis or a user 
has enabled text to translated speech, communications equipment 12 may disable the 
output of signals from voice packets to prevent conflicting outputs. 

If the packet does not encode voice information, communications equipment 12 
determines whether the packet encodes text at step 158. If not, communications 
equipment 12 may optionally display an error message indicating the receipt of an 
unknown packet at step 160, and return to monitoring communications at step 152. If a 
text packet has been received, communications equipment 12 determines whether text 
display is enabled at step 162. Text display may be enabled automatically or using any
suitable user interface, such as a DISPLAY button 62 on user interface 50. If text display
is not enabled, flow skips to step 166. However, if text display is enabled,
communications equipment 12 displays the received text using visual display 30 at step
164. In displaying text, communications equipment 12 may translate the received text
according to translation options 58 selected on graphical user interface 50. 

Communications equipment 12 determines whether speech synthesis is enabled 
at step 166. Speech synthesis may be enabled using any appropriate interface, such as 
synthesis options 54 on user interface 50. If speech synthesis is not enabled, 
communications equipment 12 resumes monitoring communications at step 152. If 
speech synthesis is enabled, communications equipment 12 converts the received text to
speech and outputs the speech at steps 164 and 170 respectively. This conversion of text 
to speech may also translate according to translation options 58 selected. 
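
The receive-side branching of FIGURE 6 can be sketched as a single dispatch function. The `Options` and `Outputs` containers, the string packet kinds, and the placeholder `synthesize` callback are assumptions made for illustration; translation is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Options:
    text_display: bool = True        # checked at step 162
    speech_synthesis: bool = False   # checked at step 166


@dataclass
class Outputs:
    speaker: list = field(default_factory=list)
    display: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def handle_packet(kind, payload, options, out, synthesize=lambda text: ("tts", text)):
    """Route one received packet; `synthesize` is a placeholder, not a real engine."""
    if kind == "voice":                       # step 156
        # Suppress raw voice when synthesis is on, to prevent conflicting outputs.
        if not options.speech_synthesis:
            out.speaker.append(payload)       # step 170
    elif kind == "text":                      # step 158
        if options.text_display:
            out.display.append(payload)       # step 164
        if options.speech_synthesis:
            out.speaker.append(synthesize(payload))
    else:
        out.errors.append(f"unknown packet type: {kind!r}")  # step 160
```

Keeping the options in one object mirrors the toggles on graphical user interface 50, so a change made by the user takes effect on the next packet.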

The preceding flowcharts illustrate only exemplary methods for communicating
dual streams encoding voice and text information based on received voice input, and for
processing the receipt of these dual communications streams from a remote location.
Communications equipment 12 contemplates many of the steps in these flowcharts taking
place simultaneously and/or in different orders than as shown. Furthermore,
communications equipment 12 contemplates using methods with additional steps, fewer
steps, or different steps, so long as the methods remain appropriate for providing a
packet-based voice communication session supplemented by a stream of text encoding
the voice information.

Although the present invention has been described in several embodiments, a
myriad of changes and modifications may be suggested to one skilled in the art, and it is
intended that the present invention encompass such changes and modifications as fall
within the scope of the present appended claims.

WHAT IS CLAIMED IS:

1. A method for communicating voice and text associated with a packet-based
voice communications session comprising:

receiving voice information from a local participant in a packet-based voice
communications session;

converting the voice information into text;

generating packets encoding the voice information and the text; and

communicating the packets encoding the voice information and the text to a
remote location.



2. The method of Claim 1, wherein the packet-based voice communications
session comprises an Internet protocol (IP) telephony communications session. 

3. The method of Claim 1, wherein generating the packets encoding the
voice information and the text comprises:

generating a first stream of packets encoding the text; and

generating a second stream of packets encoding the voice information.

4. The method of Claim 3, wherein communicating comprises
communicating the first stream of packets using a first Internet protocol (IP) transmission
protocol and communicating the second stream of packets using a second IP transmission
protocol.



5. The method of Claim 4, wherein:

the first transmission protocol comprises transmission control protocol (TCP);
and

the second transmission protocol comprises user datagram protocol (UDP).

6. The method of Claim 1, further comprising displaying the text using a
visual output device. 

7. The method of Claim 1, further comprising:

receiving packets encoding remote voice information and remote text from the
remote location;

outputting the remote voice information using an acoustic output device; and

displaying the remote text using a visual output device.



8. An interface for a telecommunications device, the interface operable to:

receive packets encoding voice information and text of the voice information
from a remote location, wherein the voice information and the text are associated with
a packet-based voice communications session;

display the text using a visual display device; and

output the voice information using an acoustic output device.

9. The interface of Claim 8, wherein the packet-based voice communications 
session comprises an Internet protocol (IP) telephony communications session. 

10. The interface of Claim 8, wherein the packets encoding voice information
and text comprise:

a first stream of packets encoding voice information from a participant in the
communications session at the remote location; and

a second stream of packets encoding text generated by converting the voice
information.



11. The interface of Claim 10, wherein the first stream of packets is
communicated using a first Internet protocol (IP) transmission protocol and the second
stream of packets is communicated using a second IP transmission protocol.



12. The interface of Claim 10, wherein:

the first transmission protocol comprises transmission control protocol (TCP);

and

the second transmission protocol comprises user datagram protocol (UDP).

13. The interface of Claim 8, further comprising:

receiving local voice information from a local participant in the packet-based
voice communications session;

converting the local voice information into local text;

generating packets encoding the local voice information and the local text; and

communicating the packets encoding the local voice information and the local
text to the remote location.

14. The interface of Claim 8, wherein the interface comprises a computer
program embodied in a computer readable medium.

15. The interface of Claim 8, further operable to output the voice information
using speech synthesis to convert the text into an audio output. 

16. The interface of Claim 8, further operable to translate the text from a first
language to a second language.

17. Telephony communications software for communicating voice and text
associated with a packet-based voice communications session, the software embodied in 
a computer readable medium and operable to: 

establish the packet-based voice communications session with a remote location; 
receive voice information from a local participant in the packet-based voice
communications session;

convert the voice information into text;

generate packets encoding the voice information and the text; and

communicate the packets encoding the voice information and the text to the
remote location.



18. The software of Claim 17, wherein the packet-based voice 
communications session comprises an Internet protocol (IP) telephony communications 
session. 

19. The software of Claim 17, further operable to: 
generate a first stream of packets encoding the text; and 

generate a second stream of packets encoding the voice information. 

20. The software of Claim 19, further operable to: 

communicate the first stream of packets using a first Internet protocol (IP) 
transmission protocol; and 

communicate the second stream of packets using a second IP transmission
protocol.

21. The software of Claim 20, wherein:

the first transmission protocol comprises transmission control protocol (TCP); 

and 

the second transmission protocol comprises user datagram protocol (UDP). 

22. The software of Claim 17, further operable to display the text using a 
visual output device. 

23. The software of Claim 17, further operable to:

receive packets encoding remote voice information and remote text from the 
remote location; 

output the remote voice information using an acoustic output device; and 
display the remote text using a visual output device.

24. A communications system for communicating voice and text associated
with a packet-based voice communications session comprising: 

a first communications device operable to establish the communications session 
with a second communications device, to receive voice information from a local 
participant in the communications session, convert the voice information into text, 
generate packets encoding the voice information and the text, and communicate the 
packets to the second communications device; and 

the second communications device operable to receive the packets from the first 
communications device, display the text using a visual display device, and output the 
voice information using an acoustic output device. 

25. The communications system of Claim 24, wherein the first 
communications device is further operable to: 

generate a first stream of packets encoding the text; and 

generate a second stream of packets encoding the voice information. 

26. The communications system of Claim 25, further operable to: 
communicate the first stream of packets using a first Internet protocol (IP) 

transmission protocol; and 

communicate the second stream of packets using a second IP transmission 
protocol. 

27. The communications system of Claim 26, wherein: 

the first transmission protocol comprises transmission control protocol (TCP); 

and 

the second transmission protocol comprises user datagram protocol (UDP).

28. The communications system of Claim 24, wherein the second
communications device is further operable to translate the text from a first language to 
a second language. 

29. The communications system of Claim 24, wherein the second 
communications device is further operable to: 

generate an audio speech signal using the text; and 

output the audio speech signal using the acoustic output device. 

30. The communications system of Claim 24, wherein the communications
session comprises a voice over packet (VoP) telephone call.

31. A device for communicating voice and text associated with a packet-based
voice communications session comprising: 

means for receiving voice information from a local participant in a packet-based 
voice communications session; 

means for converting the voice information into text; 

means for generating packets encoding the voice information and the text; and 
means for communicating the packets encoding the voice information and the text 
to a remote location. 

32. The device of Claim 31, wherein the packet-based voice communications
session comprises an Internet protocol (IP) telephony communications session. 

33. The device of Claim 31, wherein the means for generating the packets 
encoding the voice information and the text comprises: 

means for generating a first stream of packets encoding the text; and 

means for generating a second stream of packets encoding the voice information. 

34. The device of Claim 33, wherein the means for communicating comprises 
means for communicating the first stream of packets using a first Internet protocol (IP) 
transmission protocol and means for communicating the second stream of packets using 
a second IP transmission protocol. 

35. The device of Claim 34, wherein: 

the first transmission protocol comprises transmission control protocol (TCP); 

and 

the second transmission protocol comprises user datagram protocol (UDP).

36. The device of Claim 31, further comprising means for displaying the text
using a visual output device.

37. The device of Claim 31, further comprising:

means for receiving packets encoding remote voice information and remote text
from the remote location;

means for outputting the remote voice information using an acoustic output 
device; and 

means for displaying the remote text using a visual output device. 
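
By way of illustration only, and without limiting the claims, the two-stream arrangement recited above (a text stream carried over TCP and a voice stream carried over UDP) might be sketched as follows. The packet header layout, the stream-type constants, and the names `pack_packet`, `unpack_packet`, and `send_streams` are assumptions introduced for this example and do not appear in the disclosure.

```python
import socket
import struct
from typing import Iterable, Tuple

# Hypothetical header: stream type (1 byte), sequence number (4 bytes),
# payload length (2 bytes), all in network byte order.
HEADER = struct.Struct("!BIH")
STREAM_TEXT = 1   # assumed tag for the text stream
STREAM_VOICE = 2  # assumed tag for the voice stream


def pack_packet(stream_type: int, seq: int, payload: bytes) -> bytes:
    """Prefix a payload with the illustrative header."""
    return HEADER.pack(stream_type, seq, len(payload)) + payload


def unpack_packet(data: bytes) -> Tuple[int, int, bytes]:
    """Return (stream_type, seq, payload) for one packed packet."""
    stream_type, seq, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return stream_type, seq, payload


def send_streams(text_chunks: Iterable[str],
                 voice_frames: Iterable[bytes],
                 tcp_sock: socket.socket,
                 udp_sock: socket.socket,
                 udp_addr: Tuple[str, int]) -> None:
    """Send text over a connected TCP socket and voice over a UDP socket."""
    for seq, chunk in enumerate(text_chunks):
        # TCP retransmits lost segments, so the text stream arrives intact.
        tcp_sock.sendall(pack_packet(STREAM_TEXT, seq, chunk.encode("utf-8")))
    for seq, frame in enumerate(voice_frames):
        # UDP does not retransmit; a lost voice frame is simply never delivered.
        udp_sock.sendto(pack_packet(STREAM_VOICE, seq, frame), udp_addr)
```

In this sketch the sequence number lets a receiver notice a missing voice frame, while the text for the same interval still arrives on the reliable stream.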
SYSTEM AND METHOD FOR SPEECH RECOGNITION
ASSISTED VOICE COMMUNICATIONS 

ABSTRACT OF THE DISCLOSURE 
A communication system includes communications equipment having a voice
input device, an acoustic output device, and a visual display device. The communications
equipment receives voice information from a user using the voice input device, converts
the voice information into text, and communicates packets encoding the voice
information and the text to a remote location. The communications equipment also
receives packets encoding voice and text information from the remote location, outputs
the voice information using the acoustic output device, and outputs the text information
using the visual display device. 
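
As a non-limiting sketch of the flow summarized above, the following example packetizes voice frames together with their recognized text and delivers them to a stand-in for the remote equipment. It assumes one recognition result per voice frame, models packet loss with a set of dropped sequence numbers, and takes the recognizer as a caller-supplied function; the names `Packet`, `Receiver`, `transmit`, and `lost_voice` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Packet:
    kind: str      # "voice" or "text" (labels assumed for this sketch)
    seq: int
    payload: bytes


@dataclass
class Receiver:
    """Stand-in for the remote equipment: records what it would play and show."""
    played: List[bytes] = field(default_factory=list)    # acoustic output
    displayed: List[str] = field(default_factory=list)   # visual display

    def handle(self, packet: Packet) -> None:
        if packet.kind == "voice":
            self.played.append(packet.payload)
        else:
            self.displayed.append(packet.payload.decode("utf-8"))


def transmit(frames: List[bytes],
             recognize: Callable[[bytes], str],
             receiver: Receiver,
             lost_voice: Set[int]) -> None:
    """Packetize each frame and its text, dropping voice packets listed as lost."""
    for seq, frame in enumerate(frames):
        voice = Packet("voice", seq, frame)
        text = Packet("text", seq, recognize(frame).encode("utf-8"))
        if seq not in lost_voice:   # unreliable path: the frame may vanish
            receiver.handle(voice)
        receiver.handle(text)       # reliable path: the text always arrives
```

With a lookup-table recognizer and the second voice frame marked as lost, the receiver in this sketch plays two frames but still displays all three pieces of text, which is the behavior the text stream is meant to provide when voice packets are lost.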